{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# COMPSCI 389: Introduction to Machine Learning\n",
    "# Topic 0.1: Python and Jupyter Notebooks\n",
    "\n",
    "In this course we will use Python, and specifically Jupyter Notebooks like this one. This notebook provides a brief introduction to Python and Jupyter notebooks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python\n",
    "\n",
    "Python is a high-level programming language.\n",
    "- It is *interpreted*, meaning that it is not compiled into executables, but rather executed directly from the source code by a program called an \"interpreter\".\n",
    "- It is a popular language for machine learning.\n",
    "- It is *very* slow in comparison to compiled languages.\n",
    "    - Many python libraries call C++ code, making them efficient.\n",
    "    - Efficient use of python leverages these library calls for anything compute-intesive.\n",
    "    - Even writing careful python, I've found that students (BS-PhD) produce Python code that is around 6x to 100x times slower than corresponding C++ code.\n",
    "- Python code is typically stored in `.py` files.\n",
    "- Common integrated development environments (IDEs, programs for writing and running python files) include Visual Studio Code (VSCode) and PyCharm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Jupyter Notebooks\n",
    "\n",
    "Jupyter notebooks were previously called iPython notebooks, when they were restricted to Python. They have since been extended to work with many different programming languages (within a single file!) and were thus renamed to Jupyter notebooks. However, the file type retains the old name: `.ipynb` for \"IPYthon NoteBook\".\n",
    "\n",
    "This document is a Jupyter notebook. It consists of a vertical stack of \"cells\". Each cell has a type, including \"markdown\" and \"Python\".\n",
    "\n",
    "You can edit a cell by double clicking on it. If you double click on this cell, you should see the raw markdown (a language for displaying text). You should see the type of the cell in the bottom right - in this case it should say \"markdown\" in the bottom right of this cell."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you click on the bottom right of the cell where it lists the cell type, you can change the cell to a different type. We will mainly use:\n",
    "- Markdown cells for displaying text.\n",
    "- Python cells for displaying *and running* code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you have finished editing a markdown cell, you can click the check mark in the top right of the cell to stop editing it. This renders the cell, and we say the cell is \"run\". You can also hit `ctrl+enter` to run any cell or `shift+enter` to run any cell and move to the next cell. If you hit `shift+enter` on the last cell, it will automatically create a new cell after the current one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Markdown Cells\n",
    "\n",
    "(Edit this cell to see the underlying formatting.)\n",
    "\n",
    "In markdown cells you can have **bold** or *italic* text. You can have inline code like `this` or code blocks like this:\n",
    "```\n",
    "print(\"Hello World!\")\n",
    "```\n",
    "You can have block quotes like this:\n",
    "> I am not a crook.\n",
    "\n",
    "You can have [links](https://google.com).\n",
    "\n",
    "You can have comments like this: <!-- This isn't rendered. -->\n",
    "\n",
    "You can have images like this: (commented out!)\n",
    "<!-- ![Wikipedia Logo](https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-en.svg) -->\n",
    "\n",
    "If you want to control the width so it's not too big, you can have images like this (which sets the width to 400 pixels):\n",
    "<img src=\"https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-en.svg\" width=\"400\" alt=\"Wikipedia Logo\">\n",
    "\n",
    "You can make horizontal lines like this:\n",
    "\n",
    "---\n",
    "\n",
    "You can change the font color like <font color=\"red\">this</font>.\n",
    "\n",
    "You can have headers\n",
    "# Header 1\n",
    "## Header 2\n",
    "### Header 3\n",
    "#### Header 4\n",
    "\n",
    "You can make lists like this:\n",
    "- Dog\n",
    "- Cat\n",
    "    - Tabby\n",
    "    - Calico\n",
    "- Mouse\n",
    "Or like this:\n",
    "1. Apple\n",
    "2. Orange\n",
    "3. Pear\n",
    "\n",
    "You can include math like this $\\pi \\neq \\int_{-\\infty}^\\infty x^2 \\, \\text{d}x$. The language used to display math is called LaTeX. Most computer science papers are written using LaTeX. You can find a free (commonly used) LaTeX editor at https://www.overleaf.com/. VSCode only works with a restricted set of features from LaTeX, but enough to write basic equations.\n",
    "\n",
    "Here's another example that creates an \"align\" block in LaTeX, which aligns the & character on each line of the equation.\n",
    "$$\n",
    "\\begin{align}\n",
    "a =& b + c \\\\\n",
    "x =& y - z.\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "[And much much more](https://www.markdownguide.org/basic-syntax/)!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python Cells\n",
    "\n",
    "Python cells contain Python code, but are not automatically run. Python cells can be run by clicking the triangle \"run\" button in the top left of the cell or by using the `ctrl+enter` (run cell) or `shift+enter` (run cell and move to the next cell) commands.\n",
    "\n",
    "The first time that you run a cell, VSCode may prompt you for two things.\n",
    "\n",
    "1. \"Do you trust the authors of the files in this workspace?\" You are running the program in the notebook, so ensure you trust the source of the notebook.\n",
    "2. What \"Kernel\" would you like to use? This is asking what installation of python to use. Depending on your operating system, the current selection should be visible somewhere near the top-right or bottom-right of VSCode. It should say, for example, \"Python 3.1.1.7\". If you click this text, you can select different versions of python (or different virtual environments) to use.\n",
    "\n",
    "Once you have trusted the workspace and selected a python kernel, the python cell should run, showing the output below the code cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello World\n",
      "We can  print many things!  42\n"
     ]
    }
   ],
   "source": [
    "print(\"Hello World\")\n",
    "print(\"We can \", \"print many things! \", 42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Python Basics\n",
    "\n",
    "In Python:\n",
    "- Object types do not need to be specified.\n",
    "- Whitespace (tabs) are used to denote when if-statements, loops, function definitions, etc. end.\n",
    "- Packages are installed using commands like this, run in the command line:\n",
    "\n",
    "> pip install numpy\n",
    "\n",
    "Note that this will install numpy into your default Python installation. If you're using a different Python kernel in VSCode, you need to ensure that you install numpy (or the desired library) for that kernel!\n",
    "\n",
    "Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a is less than b\n",
      "We got here 1.\n"
     ]
    }
   ],
   "source": [
    "a = 10                              # Comments come after '#' symbols. Notice we didn't say the type of a\n",
    "b = 20                              # Semicolons are optional after lines, and usually not included!\n",
    "\n",
    "if a < b: \n",
    "    print(\"a is less than b\")\n",
    "    print(\"We got here 1.\")         # This is within the a<b clause, since indentation denotes code blocks, not { ... }\n",
    "elif a == b:                        # elif means \"else if\" \n",
    "    print(\"a is equal to b\")\n",
    "    print(\"We got here 2.\")\n",
    "else:\n",
    "    print(\"a is greater than b\")\n",
    "    print(\"We got here 3.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a is 10 and b is 20\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'a is 10 and b is 20'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Python has \"f-strings\", which stand for \"formatted string literals\", that make it easy to print variables:\n",
    "print(f\"a is {a} and b is {b}\")\n",
    "\n",
    "# Jupyter notebooks have \"display\" as an alternative to print.\n",
    "display(f\"a is {a} and b is {b}\")\n",
    "# Notice that it doesn't make much of a difference here, but for some structures (like Pandas DataFrames), it makes a big difference.\n",
    "# The output of display renders as html (e.g., tables and formatting), while print is meant to work in any console - only printing characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "1\n",
      "2\n",
      "3\n",
      "4\n"
     ]
    }
   ],
   "source": [
    "# Python has while loops:\n",
    "# While loop\n",
    "count = 0\n",
    "while count < 5:\n",
    "    print(count)\n",
    "    count += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0\n",
      "1\n",
      "2\n",
      "3\n",
      "4\n"
     ]
    }
   ],
   "source": [
    "# It also has for loops, but not like C++ or Java\n",
    "for i in range(5):  # range(5) generates numbers from 0 to 4\n",
    "    print(i)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "apple\n",
      "2.7\n",
      "cherry\n"
     ]
    }
   ],
   "source": [
    "# Python does NOT have \"array\" as a type! Instead, it has lists. These can hold different types of elements in each cell.\n",
    "# For loop with a list\n",
    "fruits = [\"apple\", 2.7, \"cherry\"]\n",
    "for fruit in fruits:\n",
    "    print(fruit)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Basic Types in Python\n",
    "\n",
    "Python has several built-in data types that are commonly used. Here are some of the basic types:\n",
    "\n",
    "1. **Integer**: Whole numbers without a fractional part. Example: `5`, `-3`, `42`\n",
    "2. **Float**: Numbers with a decimal point, representing real numbers. Example: `3.14`, `-0.001`, `2.0`\n",
    "3. **String**: A sequence of characters used to store text. Example: `\"Hello\"`, `'World'`, `\"1234\"`\n",
    "4. **Boolean**: Represents `True` or `False` values, often used in conditional statements.\n",
    "5. **List**: An ordered, mutable collection of items. Example: `[1, 2, 3]`, `['a', 'b', 'c']`\n",
    "6. **Tuple**: An ordered, immutable collection of items. Example: `(1, 2, 3)`, `('a', 'b', 'c')`\n",
    "7. **Dictionary**: A collection of key-value pairs. Example: `{'name': 'Alice', 'age': 25}`\n",
    "8. **Set**: An unordered collection of unique items. Example: `{1, 2, 3}`, `{'a', 'b', 'c'}`\n",
    "\n",
    "Each type serves a different purpose and has different characteristics and methods associated with it.\n"
   ]
  },
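  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the list above concrete, here is a minimal sketch that creates one value of each basic type and prints its type with the built-in `type` function (covered in more detail later in this notebook). The variable names `examples` and `value` are placeholders chosen for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# One example value for each of the basic types listed above.\n",
    "examples = [5, 3.14, \"Hello\", True, [1, 2, 3], (1, 2, 3), {\"name\": \"Alice\", \"age\": 25}, {1, 2, 3}]\n",
    "\n",
    "for value in examples:\n",
    "    # type(value).__name__ is the short name of the type, e.g. \"int\" or \"list\"\n",
    "    print(f\"{value!r} has type {type(value).__name__}\")"
   ]
  },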
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Execution Order\n",
    "In a Jupyter Notebook, you can run all of the cells above the current Python cell by clicking the play arrow with an upwards-pointing arrow next to it in the top-right of the Python cell (this isn't visible in markdown cells). At the top of VSCode there is also a button to run all cells in the notebook.\n",
    "\n",
    "Cells run in-order. However, you can manually run cells out of order. Notice that the cell below doesn't run properly:\n",
    "\n",
    "(Note: To ensure that the notebook runs all the way through when you hit run-all, this cell is commented out for now. Uncomment it and try running it. When you're done with it, comment it out again.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "# result = example_function(a, b)\n",
    "# print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, run the code below, which shows how a function is defined in Python. Notice that the types (int, Boolean, string, etc.) of arguments and returned variables are not specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function Definition\n",
    "def example_function(param1, param2):\n",
    "    # Function body\n",
    "    result = param1 + param2\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now go up and run the cell that failed again. Notice that now it works! This is because we executed the cells in a different order, and have now defined exmaple_function.\n",
    "\n",
    "It's recommended that you design your code to run in-order, but it's worth understanding how executing cells out of order can change the state of the Python interpreter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classes in Python\n",
    "\n",
    "Below is an example of how a class is defined in Python. The `__init__` function defines the constructor. Member functions of the class have the argument `self`. You don't actually pass the object as this input, rather calling `foo.myFunc()` automatically pre-appends `foo` as the first argument of the function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating a new object (class instance)\n",
    "class ExampleClass:\n",
    "    # Constructor\n",
    "    def __init__(self, value):\n",
    "        self.value = value\n",
    "\n",
    "    def display_value(self):\n",
    "        print(self.value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice in the code above that member variables of a class are not pre-specified. By assigning to a variable that does not yet exist, here `self.value`, we are adding a member variable `value` to the ExampleClass object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, World!\n"
     ]
    }
   ],
   "source": [
    "# Creating an instance of ExampleClass\n",
    "example_object = ExampleClass(\"Hello, World!\")  # Notice that we don't call ExampleClass(example_object, \"Hello, World!\"), but that is effectively what happens.\n",
    "example_object.display_value()"
   ]
  },
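  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The role of `self` described above can be checked directly. The following is a minimal sketch using a small hypothetical class, `Counter`, introduced only for this illustration. Calling a method on an object is equivalent to calling the function on the class and passing the object explicitly, and assigning to a new attribute name adds a member variable on the fly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Counter:  # Hypothetical class, used only for this illustration\n",
    "    def __init__(self, start):\n",
    "        self.count = start\n",
    "\n",
    "    def increment(self):\n",
    "        self.count += 1\n",
    "\n",
    "c = Counter(0)\n",
    "c.increment()             # Python passes c as self automatically\n",
    "Counter.increment(c)      # Equivalent call, passing the object explicitly\n",
    "print(c.count)            # Both calls incremented the same object\n",
    "\n",
    "c.label = \"my counter\"    # Assigning to a new name adds a member variable\n",
    "print(c.label)"
   ]
  },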
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Includes and Imports\n",
    "\n",
    "In C++ you write:\n",
    "```\n",
    "#include <iostream>\n",
    "```\n",
    "\n",
    "In Java you write:\n",
    "```\n",
    "import java.util.Scanner\n",
    "```\n",
    "\n",
    "In Python you write:\n",
    "```\n",
    "import math\n",
    "```\n",
    "\n",
    "Or, if you want one specific function:\n",
    "```\n",
    "from math import sqrt\n",
    "```\n",
    "\n",
    "Or, if you want to use the math library but don't want to type out \"math\" every time, you can give a shorter name:\n",
    "```\n",
    "import math as mth\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.0\n"
     ]
    }
   ],
   "source": [
    "import math\n",
    "print(math.sqrt(16))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.0\n"
     ]
    }
   ],
   "source": [
    "from math import sqrt\n",
    "\n",
    "print(sqrt(16))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.0\n"
     ]
    }
   ],
   "source": [
    "import math as mth\n",
    "\n",
    "print(mth.sqrt(16)); # This is a silly exmaple, but for long library names this can save a lot of space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4.0\n"
     ]
    }
   ],
   "source": [
    "# Like any other code, import statements from prior cells persist!\n",
    "print(sqrt(16)) # This uses from math import sqrt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numpy\n",
    "\n",
    "Numpy is a common library used for numerical computing. It is mainly used for it's `ndarray` object, which represents multi-dimensional arrays.\n",
    "\n",
    "Here's an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original array: [1 2 3 4 5]\n",
      "Array plus 10: [11 12 13 14 15]\n",
      "Array squared: [ 1  4  9 16 25]\n",
      "Mean of the array: 3.0\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Creating a NumPy array\n",
    "arr = np.array([1, 2, 3, 4, 5])\n",
    "\n",
    "# Basic operations\n",
    "arr_plus_10 = arr + 10  # Add 10 to each element\n",
    "arr_squared = arr ** 2  # Square each element\n",
    "\n",
    "# Displaying the results\n",
    "print(\"Original array:\", arr)\n",
    "print(\"Array plus 10:\", arr_plus_10)\n",
    "print(\"Array squared:\", arr_squared)\n",
    "\n",
    "# Applying a mathematical function\n",
    "mean_value = np.mean(arr)\n",
    "print(\"Mean of the array:\", mean_value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing Arrays\n",
    "\n",
    "Python allows you to specify sub-arrays in a convenient manner. In the example below we create a 2-dimensional array (a matrix). We can access this with `arr_2d[i,j]` to get the element in the i'th row and j'th column. We can get sub-arrays by specifying ranges of values for i and h. \n",
    "\n",
    "Note that `:` means \"all incices\".\n",
    "\n",
    "Here are some examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original Array:\n",
      " [[1 2 3]\n",
      " [4 5 6]\n",
      " [7 8 9]]\n",
      "Selected Columns:\n",
      " [[2 3]\n",
      " [5 6]\n",
      " [8 9]]\n"
     ]
    }
   ],
   "source": [
    "# Creating a 2D NumPy array\n",
    "arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
    "\n",
    "selected_columns = arr_2d[:, 1:3] # Get all rows, and columns 1 to 3\n",
    "\n",
    "print(\"Original Array:\\n\", arr_2d)\n",
    "print(\"Selected Columns:\\n\", selected_columns)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whoa! Notice that columns 1:3 resulted in only two columns! What's going on?\n",
    "\n",
    "The notation `i:j` means to take columns `i` through `j-1`. This convention makes it easier to reference elements when you know the length of an array. Using `0:n` for a length `n` array will give all elements, `0` to `n-1`. In the example above `1:3` includes the middle and last columns (indices 1 and 2), but not the first (index 0).\n",
    "\n",
    "We can also index backwards from the end of an array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The last element is 5.\n",
      "Original Array: [1 2 3 4 5]\n",
      "All but last element: [1 2 3 4]\n"
     ]
    }
   ],
   "source": [
    "# Create a 1D NumPy array\n",
    "arr_1d = np.array([1, 2, 3, 4, 5])\n",
    "\n",
    "# Print the last element\n",
    "print(f\"The last element is {arr_1d[-1]}.\")\n",
    "\n",
    "# Using [:-1] indexing to select all elements except the last one\n",
    "all_but_last = arr_1d[:-1]\n",
    "\n",
    "print(\"Original Array:\", arr_1d)\n",
    "print(\"All but last element:\", all_but_last)\n"
   ]
  },
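  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The half-open `i:j` convention described earlier has a few handy consequences that can be checked directly. Here is a minimal sketch, where the names `letters`, `n`, and `k` are placeholders chosen for this illustration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "letters = np.array([\"a\", \"b\", \"c\", \"d\", \"e\"])\n",
    "n = len(letters)\n",
    "k = 2\n",
    "\n",
    "print(letters[0:n])                # All n elements, indices 0 to n-1\n",
    "print(letters[:k])                 # The first k elements\n",
    "print(letters[k:])                 # Everything from index k onward\n",
    "print(len(letters[1:4]) == 4 - 1)  # A slice i:j contains j - i elements"
   ]
  },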
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use an array of Booleans to index into an array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "indices =  [False False False False  True  True  True]\n",
      "above_threshold =  [20 25 30]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Creating a NumPy array\n",
    "arr = np.array([1, 5, 10, 15, 20, 25, 30])\n",
    "\n",
    "# Define the threshold\n",
    "threshold = 15\n",
    "\n",
    "# Get indices above threshold\n",
    "indices = arr > threshold   # This is an array of Booleans\n",
    "print(\"indices = \", indices)\n",
    "\n",
    "# Get the corresponding values\n",
    "above_threshold = arr[arr > threshold]\n",
    "print(\"above_threshold = \", above_threshold)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Types\n",
    "\n",
    "We can get the type of an object using the `type` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'int'>\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "int"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Print the type of a - an integer from the start.\n",
    "print(type(a))\n",
    "\n",
    "# Notice that here display is nicer:\n",
    "display(type(a))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'>\n"
     ]
    }
   ],
   "source": [
    "display(type(indices))    # This tells us its a numpy ndarray, but not what is in the array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('bool')"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(indices.dtype)    # This tells us what is inside the numpy array."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}